Please see \hyperlink{single__machine__usage}{Single Machine Usage} for information on running the individual commands. This section describes how to run those commands on multiple machines, so the details of the individual commands are not repeated here. 
\begin{DoxyEnumerate}
\item 

{\bfseries Using Hadoop} 
\begin{DoxyEnumerate}
\item 

Run Y!LDA on the corpus: 
\begin{DoxyEnumerate}
\item 

Assuming you have a homogeneous setup, install Y!LDA on one machine. 
\item 

Run {\ttfamily make jar} to create the LDALibs.jar file with all the required libraries and binaries. 
\item 

Copy LDALibs.jar to HDFS. 
\item 

Copy Formatter.sh, LDA.sh, functions.sh and runLDA.sh to the gateway. A gateway is any machine with access to the grid, that is, a machine from which you run your hadoop commands. 
\item 

Find the maximum memory allowed per map task on your cluster and pass it to the script via the maxmem parameter. You can find it by opening any job conf (job.xml) and searching for the value of the "mapred.cluster.max.map.memory.mb" property. 
\item 

Run {\ttfamily runLDA.sh 1 "flags" \mbox{[}train$|$test\mbox{]} "queue" "organized-\/corpus" "output-\/dir" "max-\/mem" "number\_\-of\_\-topics" "number\_\-of\_\-iters" "full\_\-hdfs\_\-path\_\-of\_\-LDALibs.jar" "number\_\-of\_\-machines" \mbox{[}"training-\/output"\mbox{]} }\par
\par
 Training example: {\ttfamily runLDA.sh 1 "" train default "/user/me/organized-\/corpus" "/user/me/lda-\/output" 6144 100 100 "LDALibs.jar" 10 }\par
\par
 Test example: {\ttfamily runLDA.sh 1 "" test default "/user/me/organized-\/corpus" "/user/me/lda-\/output" 6144 100 100 "LDALibs.jar" 10 "/user/me/lda-\/output" }  
\item 

This starts two MapReduce Streaming jobs. The first job runs the formatter in its reducers, using the supplied number\_\-of\_\-machines as the number of reduce tasks, so your corpus is split into number\_\-of\_\-machines parts and formatted into the protobuffer format files needed by the second job. The second job is a map-\/only script that takes this formatted input and runs on number\_\-of\_\-machines separate machines, with each task working on one chunk of the corpus. On each machine the script starts the \hyperlink{class_d_m___server}{DM\_\-Server} and then runs Y!LDA. 
\item 

For testing, use the test flag and provide the directory that stores the training output. 
\end{DoxyEnumerate}
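
The preparation steps above can be sketched in shell. This is a minimal sketch, not part of the Y!LDA scripts: the helper names {\ttfamily max\_\-map\_\-mem} and {\ttfamily stage\_\-jar} are hypothetical, the HDFS destination is a placeholder, and the sketch assumes job.xml stores each property as a name element followed by a value element.

```shell
# Hypothetical helpers; they are not shipped with Y!LDA.

# Print the value of mapred.cluster.max.map.memory.mb found in a job.xml file.
# Assumes the usual <property><name>..</name><value>..</value></property> layout.
max_map_mem() {  # arg: path to job.xml
    tr -d '\n' < "$1" |
        sed -n 's:.*<name>mapred\.cluster\.max\.map\.memory\.mb</name>[^<]*<value>\([^<]*\)</value>.*:\1:p'
}

# Build LDALibs.jar and copy it to HDFS.
stage_jar() {  # arg: HDFS destination, e.g. /user/me/LDALibs.jar (placeholder)
    make jar && hadoop fs -put LDALibs.jar "$1"
}
```

The value printed by {\ttfamily max\_\-map\_\-mem} is the number to pass as the "max-\/mem" argument of runLDA.sh.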
\item 

Output generated 
\begin{DoxyEnumerate}
\item 

Creates $<$number\_\-of\_\-machines$>$ folders in $<$output-\/dir$>$, one for each client.  
\item 

Each of these directories holds the same output as in the single machine case, but from a different client.  
\item 

$<$output-\/dir$>$/$<$client-\/id$>$/learntopics.WARNING contains the output written to stderr by client $<$client-\/id$>$ 
\item 

$<$output-\/dir$>$/$<$client-\/id$>$/lda.docToTop.txt contains the topic proportions assigned to the documents in the portion of the corpus allotted to client $<$client-\/id$>$ 
\item 

$<$output-\/dir$>$/$<$client-\/id$>$/lda.topToWor.txt contains the salient words learned for each topic. This is almost the same across clients, so you can pick any one of these files as the salient words per topic for the full corpus. 
\item 

$<$output-\/dir$>$/$<$client-\/id$>$/lda.ttc.dump contains the actual model. Like the salient words, it is almost the same across clients, and any one of these files can be used as the model for the full corpus. 
\item 

$<$output-\/dir$>$/global contains the dump of the global dictionary and the partitioned global topic counts table. These are generated in the training phase and are critical for the test option to work. 
\end{DoxyEnumerate}
\item 

Viewing progress 
\begin{DoxyEnumerate}
\item 

The stderr output of the code is redirected into the Hadoop logs, so you can check the task logs from the tracking URL displayed in the output of runLDA.sh to see what is happening. 
\end{DoxyEnumerate}
\item 

Failure Recovery 

We provide a check-\/pointing mechanism to handle recovery from failures. The current scheme works in local mode and, for distributed mode, with Hadoop, because the distributed check-\/pointing stores the check-\/points in HDFS. The process is as follows:  
\begin{DoxyEnumerate}
\item The formatter task is run on the inputs and the formatted input is stored in a temporary location. 
\item The learntopics task is run using the temporary location as its input and the specified output as its output directory. Care is taken to start exactly number\_\-of\_\-machines mappers for the learntopics tasks. The input is a dummy directory structure containing as many dummy directories as the number\_\-of\_\-machines parameter supplied by the user. 
\item Each learntopics task copies its portion of the formatted input to local disk, using a dfs copy\_\-to\_\-local of the folder corresponding to its mapred\_\-task\_\-partition. 
\item Each task then runs learntopics with the temporary directory containing the formatted input as its check-\/point directory. All information needed to restart learntopics from the previous check-\/pointed iteration is therefore available locally, and any progress made is written back to the temporary input directory. 
\end{DoxyEnumerate}

This mechanism is used by the scripts to detect failures and re-\/run the task from the previous checkpoint. Because learntopics is designed to check whether check-\/point metadata is available in the working directory and to resume from it, a separate restart option is unnecessary.  

As a by-\/product, you can do incremental runs: for example, run 100 iterations, check the output, and run the next 100 iterations if needed. The scripts detect this condition and ask whether you want to continue from where you left off or restart from the beginning. 

The scripts are designed so that all of this happens transparently to the user. This information is for developers and for cases where the recovery mechanism could not handle the failure in the specified number of attempts. Check the stderr logs to see the reason for the failure. Most often it is wrong usage, which results in unrecoverable aborts. If you think it is because of a flaky cluster, try increasing the number of attempts. If nothing works and you think there is a bug in the code, please let us know. 
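
The retry behaviour described above can be illustrated with a small shell function. This is a hedged sketch of the general idea only, not the code used by the Y!LDA scripts: the function name {\ttfamily retry} is hypothetical. It relies on the property stated above that learntopics resumes from any check-\/point metadata it finds in its working directory, so re-\/running the same command is enough to continue from the last check-\/point.

```shell
# Hypothetical helper illustrating "re-run up to N attempts"; not part of Y!LDA.
# Runs the given command until it succeeds or max_attempts is reached.
retry() {  # args: max_attempts command [args...]
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0          # the command succeeded, stop retrying
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1                  # every attempt failed
}
```

For example, {\ttfamily retry 3} followed by your usual learntopics command line would run it at most three times and stop at the first success.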
\end{DoxyEnumerate}
\end{DoxyEnumerate}
\begin{DoxyEnumerate}
\item 

{\bfseries Using SSH -\/ Assume you have 'm' machines} 
\begin{DoxyEnumerate}
\item 

If you have a homogeneous setup, install Y!LDA on one machine, run {\ttfamily make jar} and copy LDALibs.jar to all the other machines in the setup. Otherwise, install Y!LDA on all machines. 
\item 

Split the corpus into 'm' parts and distribute them to the 'm' machines 
\item 

Run formatter on each split of the corpus on every machine. 
\item 

Run the Distributed\_\-Map Server on each machine as a background process using nohup: 

{\ttfamily nohup ./DM\_\-Server $<$model$>$ $<$server\_\-id$>$ $<$num\_\-clients$>$ $<$host:port$>$ -\/-\/Ice.ThreadPool.Server.SizeMax=9 \&} 
\begin{DoxyEnumerate}
\item 

model: an integer that represents the model. Set it to 1 for \hyperlink{class_unigram___model}{Unigram\_\-Model} 
\item 

server\_\-id: a number that denotes the index of this server in the list of servers provided to 'learntopics'. For example, if server1 listens on 10.1.1.1:10000 and is assigned id 0, and server2 listens on 10.1.1.2:10000 and is assigned id 1, the list of servers provided to 'learntopics' must be 10.1.1.1:10000, 10.1.1.2:10000 and not the other way around. 
\item 

num\_\-clients: a number that denotes the number of clients that will access the Distributed Map. It is usually equal to 'm' and is used to implement a barrier. 
\item 

host:port: the IP address and port on which the server must listen 
\end{DoxyEnumerate}
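
The relationship between server\_\-id and the server list can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the addresses and port are the placeholder values from the example above, num\_\-clients is assumed to equal the number of servers, and the list is joined with commas without spaces. The script only prints the commands; it does not start anything.

```shell
# Hosts listed in server_id order (placeholder addresses from the example above).
HOSTS="10.1.1.1 10.1.1.2"
PORT=10000

# Count the hosts; used here as num_clients (assumption: one client per server).
num_servers() {
    n=0
    for h in $HOSTS; do n=$((n + 1)); done
    echo "$n"
}

# Join host:port pairs with commas, in server_id order, for learntopics --servers.
server_list() {
    list=""
    for h in $HOSTS; do
        list="${list:+$list,}$h:$PORT"
    done
    printf '%s\n' "$list"
}

# Print the DM_Server command line for one server (model 1 = Unigram_Model).
dm_server_cmd() {  # args: server_id host
    printf 'nohup ./DM_Server 1 %s %s %s:%s --Ice.ThreadPool.Server.SizeMax=9 &\n' \
        "$1" "$(num_servers)" "$2" "$PORT"
}

# Print one command per host; the position in HOSTS is the server_id.
id=0
for host in $HOSTS; do
    dm_server_cmd "$id" "$host"
    id=$((id + 1))
done
```

Deriving both the server\_\-id values and the server list from the same ordered host list keeps the two consistent, which is the requirement described for server\_\-id.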
\item 

Run Y!LDA on the corpus: 
\begin{DoxyEnumerate}
\item 

On every machine run 

{\ttfamily learntopics -\/-\/topics=$<$topics$>$ -\/-\/iter=$<$iter$>$ -\/-\/servers=$<$list-\/of-\/servers$>$ -\/-\/chkptdir="/tmp" -\/-\/chkptinterval=10000} 
\begin{DoxyEnumerate}
\item 

$<$list-\/of-\/servers$>$: The comma-\/separated list of ip:port addresses of the servers in the set-\/up. The position of each ip:port entry must match the server\_\-id used when starting that server. 
\item 

chkptdir \& chkptinterval: These are currently used only with the Hadoop set-\/up. Set chkptdir to any dummy directory. To keep the checkpointing code from executing, set chkptinterval to a very large value, or to any number greater than the number of iterations. 
\end{DoxyEnumerate}
\item 

Create Global Dictionary -\/ Run the following on the server with id 0, assuming learntopics was run in the folder /tmp/corpus 
\begin{DoxyEnumerate}
\item 

{\ttfamily mkdir -\/p /tmp/corpus/global\_\-dict; cd /tmp/corpus/global\_\-dict;} 
\item 

{\ttfamily scp server\_\-i:/tmp/corpus/lda.dict.dump lda.dict.dump.i }. Run this for every server, where 'i' is that server's server\_\-id. 
\item 

Merge Dictionaries\par
 {\ttfamily Merge\_\-Dictionaries -\/-\/dictionaries=m -\/-\/dumpprefix=lda.dict.dump} 
\item 

{\ttfamily mkdir -\/p ../global; mkdir -\/p ../global/topic\_\-counts; cp lda.dict.dump ../global/;}  
\end{DoxyEnumerate}
\item 

Create a sharded Global Word-\/Topic Counts dump -\/ Run on every machine in the set-\/up 
\begin{DoxyEnumerate}
\item 

{\ttfamily mkdir -\/p /tmp/corpus/global\_\-top\_\-cnts; cd /tmp/corpus/global\_\-top\_\-cnts;} 
\item 

{\ttfamily scp server\_\-0:/tmp/corpus/global/lda.dict.dump lda.dict.dump.global} 
\item 

{\ttfamily Merge\_\-Topic\_\-Counts -\/-\/topics=$<$topics$>$ -\/-\/clientid=$<$server-\/id$>$ -\/-\/servers=$<$list-\/of-\/servers$>$ -\/-\/globaldictionary="lda.dict.dump.global"} 
\item 

{\ttfamily scp lda.ttc.dump server\_\-0:/tmp/corpus/global/topic\_\-counts/lda.ttc.dump.\$server-\/id} 
\end{DoxyEnumerate}
\item 

Copy the parameters dump file to global dump -\/ Run on server\_\-0 
\begin{DoxyEnumerate}
\item 

{\ttfamily cd /tmp/corpus; cp lda.par.dump global/topic\_\-counts/lda.par.dump} 
\end{DoxyEnumerate}
\item 

This completes training and the model is available on server\_\-0:/tmp/corpus/global 
\end{DoxyEnumerate}
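
The dictionary collection step on server\_\-0 can be sketched as a loop over the server ids. This is a minimal sketch, not part of Y!LDA: the function name is hypothetical, and {\ttfamily server\_\-$<$i$>$} are the same placeholder host names used in the steps above. The function only prints the scp commands so they can be reviewed before being run.

```shell
# Hypothetical helper; prints the scp commands that fetch every server's
# dictionary dump into the current directory, one file per server_id.
collect_dict_cmds() {  # arg: m, the number of machines
    i=0
    while [ "$i" -lt "$1" ]; do
        printf 'scp server_%s:/tmp/corpus/lda.dict.dump lda.dict.dump.%s\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Show the commands for a two-machine set-up.
collect_dict_cmds 2
```

Running the printed commands from inside /tmp/corpus/global\_\-dict leaves one dump per server there, after which Merge\_\-Dictionaries can be run with -\/-\/dictionaries set to the same 'm'.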
\item 

Running Y!LDA in test mode: Run on server\_\-0, assuming the test corpus is in /tmp/test\_\-corpus 
\begin{DoxyEnumerate}
\item 

{\ttfamily cd /tmp/test\_\-corpus;}  
\item 

{\ttfamily cp -\/r ../corpus/global .} 
\item 

{\ttfamily learntopics -\/teststream -\/-\/dumpprefix=global/topic\_\-counts/lda -\/-\/numdumps=m -\/-\/dictionary=global/lda.dict.dump -\/-\/maxmemory=2048 -\/-\/topics=$<$topics$>$} 
\item 

Pipe all your documents, in the same format that 'formatter' expects, to the above command's stdin (for example with {\ttfamily cat}). 
\end{DoxyEnumerate}
\end{DoxyEnumerate}
\end{DoxyEnumerate}